EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis

Authors

  • Sophia S. F. Lee
  • Lei Sun
  • Rafal Kustra
  • Shelley B. Bull
Abstract

MOTIVATION: We developed an EM-random forest (EMRF) for Haseman-Elston quantitative trait linkage analysis that accounts for marker ambiguity and weights each sib-pair according to the posterior identical-by-descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of correlation between markers. We define new VI indices that borrow information from linked markers using the correlation structure inherent in IBD linkage data. RESULTS: Using simulations, we found that the new VI indices in EMRF performed better than the original RF VI index and performed similarly to or better than the EM-Haseman-Elston regression LOD score for various genetic models. Moreover, tree size and the size of the marker subset evaluated at each node are important considerations in RFs. AVAILABILITY: The source code for EMRF, written in C, is available at www.infornomics.utoronto.ca/downloads/EMRF.
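
The weighting idea in the abstract can be illustrated with a minimal sketch of an EM-style Haseman-Elston regression. This is not the authors' C implementation, and the details are assumptions made for illustration: each sib-pair is expanded into three pseudo-observations (IBD = 0, 1, 2), the squared trait difference is regressed on the proportion of alleles shared IBD by weighted least squares, and the weights are updated from marker-based prior IBD probabilities under an assumed normal residual model. The function names `weighted_he_fit` and `em_he` are hypothetical.

```python
import numpy as np

# Proportion of alleles shared IBD for the states IBD = 0, 1, 2.
PI_VALUES = np.array([0.0, 0.5, 1.0])


def weighted_he_fit(sq_diff, weights):
    """Weighted least squares of squared sib-pair trait difference on IBD sharing.

    sq_diff : (n,) squared trait differences, one per sib-pair.
    weights : (n, 3) IBD-state weights per sib-pair (rows sum to 1).
    Returns (intercept, slope, weighted residual variance).
    """
    n = sq_diff.shape[0]
    x = np.tile(PI_VALUES, n)      # each pair contributes states 0, 1, 2
    y = np.repeat(sq_diff, 3)      # same response for all three pseudo-observations
    w = weights.reshape(-1)        # row-major order matches x and y
    design = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    resid = y - design @ beta
    sigma2 = np.sum(w * resid ** 2) / np.sum(w)
    return beta[0], beta[1], sigma2


def em_he(sq_diff, prior_ibd, n_iter=50):
    """Alternate weighted fits (M-step) and posterior IBD updates (E-step).

    Assumption for this sketch: residuals are normal with a common variance,
    so the posterior weight is proportional to prior * normal density.
    """
    weights = prior_ibd.astype(float).copy()
    for _ in range(n_iter):
        a, b, s2 = weighted_he_fit(sq_diff, weights)
        mu = a + b * PI_VALUES
        s2_safe = max(s2, 1e-12)   # guard against a perfect fit
        log_lik = -0.5 * (sq_diff[:, None] - mu[None, :]) ** 2 / s2_safe
        with np.errstate(divide="ignore"):
            log_post = np.log(prior_ibd) + log_lik   # zero prior gives -inf
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        weights = post / post.sum(axis=1, keepdims=True)
    return a, b, s2, weights
```

A negative fitted slope is the classical Haseman-Elston signal of linkage at a marker. When the prior IBD distribution is unambiguous (one state has probability 1), the weights never change and the fit reduces to ordinary Haseman-Elston regression.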

Similar resources

Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data.

A novel fine structure mapping method for quantitative traits is presented. It is based on Bayesian modeling and inference, treating the number of quantitative trait loci (QTLs) as an unobserved random variable and using ideas similar to composite interval mapping to account for the effects of QTLs in other chromosomes. The method is introduced for inbred lines and it can be applied also in sit...

Evaluating the accuracy of genomic prediction under different genomic architectures of quantitative and threshold traits with imputation of simulated genomic data, using the random forest method

Genomic selection is a promising challenge for discovering genetic variants influencing quantitative and threshold traits for improving the genetic gain and accuracy of genomic prediction in animal breeding. Since a proportion of genotypes are generally uncalled, therefore, prediction of genomic accuracy requires imputation of missing genotypes. The objectives of this study were (1) to quantify...

Linkage analysis of microsatellite markers on chromosome 5 in an F2 population of Japanese quail to identify quantitative trait loci affecting carcass traits

An F2 Japanese quail population was developed by crossing two strains (wild and white) to map quantitative trait loci (QTL) for performance and carcass traits. A total of 472 F2 birds were reared and slaughtered at 42 days of age. Performance and carcass traits were measured on all of the F2 individuals. Parental (P0), F1 and F2 individuals were genotyped with 3 microsatellites from quail chrom...

Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies

Genome-wide association study (GWAS) entails examining a large number of single nucleotide polymorphisms (SNPs) in a limited sample with hundreds of individuals, implying a variable selection problem in the high dimensional dataset. Although many single-locus GWAS approaches under polygenic background and population structure controls have been widely used, some significant loci fail to be dete...

Estimating linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations.

Positional cloning of gene(s) underlying a complex trait requires a high-resolution linkage map between the trait locus and genetic marker loci. Recent research has shown that this may be achieved through appropriately modeling and screening linkage disequilibrium between the candidate marker locus and the major trait locus. A quantitative genetics model was developed in the present study to es...

Journal:
  • Bioinformatics

Volume 24

Publication date: 2008